Create an interactive Age-Sex Pyramid for Singapore Population Demographics

To see the changes in demographic structure in Singapore Population based on Age Group and Sex through 2000 to 2020 based on Planning Area

Tan Yan Ru (MITB, Singapore Management University)
2022-02-04

1. Introduction

In the last take-home exercise, an age-sex pyramid was built by grouping the population count by sex and age group for Year 2021 and the plot was a static plot with no interaction or animation. For this take-home exercise, the datasets used will be the Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling 2000-2010 and Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling 2011-2020 to show interactivity and animation of the plots. For interactivity, the plan is to add tooltips that will show more information as user hovers the mouse above each bar chart. As for animation, the chart will show the changes that have been occurring with the population count for each Age Group Sex at the Planning Area level through the years 2000 to 2020.

1.1 Sketch of Charts to be done

The age-sex pyramid for each planning area will be done using the facet_wrap() function in ggplot to be able to arrange the plots nicely and to be able to link them to a slider where user will be able to manually control which Year to see. There will be a Play button to allow animation to run. The animation will show the changes in population demographic throughout the years 2000 to 2020.

Sketch for Age-Sex Pyramid by Planning Area

2. Data Preparation

2.1 Import all libraries required

packages = c('ggiraph', 'plotly', 
             'DT', 'patchwork',
             'gganimate', 'tidyverse',
             'readxl', 'gifski', 'gapminder')

for(p in packages){library
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

2.2 Read in data in CSV

Read in both CSV files use read_csv function and use bind_rows function to combine them into one dataframe.

pop_2010 <- read_csv("data/respopagesextod2000to2010.csv", 
                     show_col_types = FALSE)

pop_2020 <- read_csv("data/respopagesextod2011to2020.csv", 
                     show_col_types = FALSE) 

pop <- bind_rows(pop_2010, pop_2020)

2.3 Check for Zero Total Population for Planning Area

As we are interested in the changes in demographic structure in Singapore at planning area level, areas with zero total population will be excluded from the plots as there is no information to be gathered from showing the plots of these planning areas. Hence, we will first check if there are any planning areas that have zero total population for all years.

check_pop <- pop %>%
  group_by('Planning Area' = pop$PA) %>%
  summarise('Total Pop' = sum(Pop)) %>%
  ungroup()

pop_eq_0 <- filter(check_pop, check_pop$`Total Pop` == 0)

pop_eq_0
# A tibble: 13 x 2
   `Planning Area`         `Total Pop`
   <chr>                         <dbl>
 1 Boon Lay                          0
 2 Boon Lay/Pioneer                  0
 3 Central Water Catchment           0
 4 Changi Bay                        0
 5 Marina East                       0
 6 Marina South                      0
 7 Paya Lebar                        0
 8 Pioneer                           0
 9 Simpang                           0
10 Straits View                      0
11 Tengah                            0
12 Tuas                              0
13 Western Islands                   0

From the above table, 13 of the planning area have zero total population count throughout the years. In this case, these planning areas will be omitted from the final table.

2.4 Get final table for population count by Age Group, Sex, Planning Area and Year

We will now get the final table by summing up the total population count by Age Group, Sex, Planning Area and Year by using the group_by() function and summarise() function. Subsequently, the 13 Planning Areas that have been highlighted above will be omitted from the final table using the filter() function to only select rows where Population Total is not zero.

pop_count <- pop %>% 
  group_by('AG' = pop$AG, 'Sex' = pop$Sex, 'Year' = pop$Time, 'PA' = pop$PA) %>%
  summarise('PopTotal' = sum(Pop)) %>%
  ungroup()

pop_count <- filter(pop_count, !pop_count$`PA` %in% pop_eq_0$`Planning Area`)

Below tables shows the remaining Planning Area that will be used.

unique(pop_count$`PA`)
 [1] "Ang Mo Kio"              "Bedok"                  
 [3] "Bishan"                  "Bukit Batok"            
 [5] "Bukit Merah"             "Bukit Panjang"          
 [7] "Bukit Timah"             "Changi"                 
 [9] "Choa Chu Kang"           "Clementi"               
[11] "Downtown Core"           "Geylang"                
[13] "Hougang"                 "Jurong East"            
[15] "Jurong West"             "Kallang"                
[17] "Lim Chu Kang"            "Mandai"                 
[19] "Marine Parade"           "Museum"                 
[21] "Newton"                  "North-Eastern Islands"  
[23] "Not Stated"              "Novena"                 
[25] "Orchard"                 "Outram"                 
[27] "Pasir Ris"               "Punggol"                
[29] "Queenstown"              "River Valley"           
[31] "Rochor"                  "Seletar"                
[33] "Sembawang"               "Sengkang"               
[35] "Serangoon"               "Singapore River"        
[37] "Southern Islands"        "Sungei Kadut"           
[39] "Tampines"                "Tanglin"                
[41] "Toa Payoh"               "Western Water Catchment"
[43] "Woodlands"               "Yishun"                 

Looking at the remaining Planning Area, we will have a look at the areas like North-Eastern Islands, Lim Chu Kang, Museum and Sungei Kadut that are known to not have many residential areas. There is also a Planning Area called ‘Not Stated’ which will be removed as it is meaningless to see the age-sex pyramid for an unknown Planning Area.

remove_PA = c('North-Eastern Islands','Not Stated','Lim Chu Kang','Museum','Southern Islands','Sungei Kadut',
              'Western Water Catchment','Changi','Seletar','Singapore River','Mandai', 'Downtown Core',
              'Orchard', 'Rochor', 'Tanglin', 'Newton', 'Outram', 'River Valley')
check_PA_others <- filter(pop_count, pop_count$PA %in% remove_PA) %>%
  group_by(Year, PA) %>%
  summarise('Total Pop' = sum(abs(PopTotal))) %>%
  ungroup()
ggplot(check_PA_others, aes(x=as.integer(Year), y=`Total Pop`)) +
  geom_line(colour = 'steelblue') +
  facet_wrap(~ PA, scales = 'free_y', ncol=4) +
  labs(title = 'Total Population vs Year', subtitle='Changes in Population Count for non-residential Planning Area',
       y = 'Total Population', x='Year')

From the above plots, as the total population for the planning areas in the list are very small as compared to the other Planning Areas, these planning areas will be removed from the final table. Orchard, Rochor, Tanglin, Newton, Outram and River Valley will also be excluded as the population density in these planning areas are relatively small.

Hence, the Planning Areas that will be used are stated below.

pop_count <- filter(pop_count, !pop_count$`PA` %in% remove_PA)

unique(pop_count$`PA`)
 [1] "Ang Mo Kio"    "Bedok"         "Bishan"        "Bukit Batok"  
 [5] "Bukit Merah"   "Bukit Panjang" "Bukit Timah"   "Choa Chu Kang"
 [9] "Clementi"      "Geylang"       "Hougang"       "Jurong East"  
[13] "Jurong West"   "Kallang"       "Marine Parade" "Novena"       
[17] "Pasir Ris"     "Punggol"       "Queenstown"    "Sembawang"    
[21] "Sengkang"      "Serangoon"     "Tampines"      "Toa Payoh"    
[25] "Woodlands"     "Yishun"       

2.5 Change all Total Population for Males to Negative values

To show the total population for Males on the left, the total population will be multiplied by -1.

pop_count$`PopTotal` <- ifelse(pop_count$Sex == 'Males', 
                                -1*pop_count$`PopTotal`, 
                                pop_count$`PopTotal`)

3. Plot the Charts

Now that the data has been prepared for plotting, I will start by creating the ggplot to show the age-sex pyramid for all planning areas. Finally, ggplotly will be used to animate the plot. It also provides the tooltips function to show the Age Group and Population Count.

As the final number of planning areas chosen is 21, I will reduce it to 10 planning areas in total, 5 mature estates and 5 non-mature estates, to see the population demographics for the two types of estates seen in Singapore. For mature estates, Ang Mo Kio, Tampines, Jurong West, Bedok and Woodlands, have been chosen and for the non-mature estates, Punggol, Seng Kang, Sembawang, Hougang and Bukit Panjang will be used.

**Note: For an age-sex pyramid, the youngest age group is at the bottom and oldest is at the top of the pyramid*

age_order <- c("0_to_4","5_to_9","10_to_14","15_to_19",
             "20_to_24","25_to_29","30_to_34","35_to_39",
             "40_to_44","45_to_49","50_to_54","55_to_59",
             "60_to_64","65_to_69","70_to_74","75_to_79",
             "80_to_84","85_to_89","90_and_over")

mature_PA = c('Ang Mo Kio', 'Tampines', 'Jurong West', 'Bedok', 'Woodlands')

non_mature_PA = c('Punggol', 'Sengkang', 'Sembawang', 'Hougang','Bukit Panjang')

3.1 Age-Sex Pyramid for Mature Planning Areas

filtered_data <- filter(pop_count, pop_count$PA %in% mature_PA)

p <-ggplot(filtered_data, aes(x=factor(AG, levels = age_order),
                      y= PopTotal,
                      fill= Sex,
                      frame = Year,
                      text = paste("Age Group: ",AG, '<br>', 'Pop Count: ', abs(PopTotal)))) +
  geom_col(data = subset(filtered_data, Sex == 'Males'), position = 'identity', fill='skyblue') +
  geom_col(data = subset(filtered_data, Sex == 'Females'), position = 'identity', fill='pink') +
  scale_y_continuous(labels=NULL) +
  scale_x_discrete(labels=NULL) +
  coord_flip() +
  facet_wrap(~PA, scales = 'free_y', ncol = 3) +
  theme(strip.background =element_rect(fill="whitesmoke", colour = 'black'),
        panel.background = element_rect(fill = "white"),
        panel.border = element_rect(colour = "black", fill=NA, size=1),
        panel.spacing.y = unit(1.5, "lines")) + 
  labs(x = 'Age Group', y = 'Population Count')

ggplotly(p, tooltip = c('text')) %>% 
  layout(autosize = F, width = 800, height = 800, 
         title = list(text = paste0('Population Pyramid of Singapore In Mature Estates',
                                    '<br>',
                                    '<sup>',
                                    'Shows changes in Population Demographic by Age Group and Sex from 2000-2020',
                                    '</sup>')),
         margin = list(t = 90)) %>%
  animation_opts(
    frame= 1000, transition = 1000, easing = "linear", redraw = FALSE 
  ) %>% 
  animation_slider(
    currentvalue = list(prefix = "YEAR: ", font = list(color="Black"))
  )

Looking at the 5 mature estates population demographic, in the early 2000s, the population demographic showed younger families living in these estates as shown by the wider bottom and middle part of the age-sex pyramid. However, as the years progress, it can be seen that the top of the pyramid starts to widen and the base becomes narrower. This indicates a growing aging population in these estates. We can also see the widening of the bars going upwards. The total population count for these estates have also remained more or less the same throughout the years as we do not see and expanding chart when compared to the non-mature estates.

We will now look at how the Non-Mature estates’ population demographic has changed through 2000 to 2020.

3.2 Age-Sex Pyramid for Non-Mature Planning Areas

filtered_data_2 <- filter(pop_count, pop_count$PA %in% non_mature_PA)

p <-ggplot(filtered_data_2, aes(x=factor(AG, levels = age_order),
                      y= PopTotal,
                      fill= Sex,
                      frame = Year,
                      text = paste("Age Group: ",AG, '<br>', 'Pop Count: ', abs(PopTotal)))) +
  geom_col(data = subset(filtered_data_2, Sex == 'Males'), position = 'identity', fill='skyblue') +
  geom_col(data = subset(filtered_data_2, Sex == 'Females'), position = 'identity', fill='pink') +
  scale_y_continuous(labels=NULL) +
  scale_x_discrete(labels=NULL) +
  coord_flip() +
  facet_wrap(~PA, scales = 'free_y', ncol = 3) +
  theme(strip.background =element_rect(fill="whitesmoke", colour = 'black'),
        panel.background = element_rect(fill = "white"),
        panel.border = element_rect(colour = "black", fill=NA, size=1),
        panel.spacing.y = unit(1.5, "lines")) + 
  labs(x = 'Age Group', y = 'Population Count')

ggplotly(p, tooltip = c('text')) %>% 
  layout(autosize = F, width = 800, height = 800, 
         title = list(text = paste0('Population Pyramid of Singapore In Non-Mature Estates',
                                    '<br>',
                                    '<sup>',
                                    'Shows changes in Population Demographic by Age Group and Sex from 2000-2020',
                                    '</sup>')),
         margin = list(t = 90)) %>%
  animation_opts(
    frame= 1000, transition = 1000, easing = "linear", redraw = FALSE 
  ) %>% 
  animation_slider(
    currentvalue = list(prefix = "YEAR: ", font = list(color="Black"))
  )

For the non-mature estates, the population count for each age group starts out small and the young and middle aged group grows throughout the years. This is due to young families moving into these new non-mature estates. This growth in population is the greatest in Punggol, Bukit Panjang, Sengkang and Sembawang which are where a great number of new BTO flats have been built. As for Hougang, it is an older estate as compared to the other 4 as can be seen from its larger population numbers in 2000. It is also starting to show the trend seen in mature estates.

Conclusion

Doing an animation for the mature and non-mature estates gives us more in depth information on how the population demographic changes and what these changes are like. The population demographic for the non-mature estates now might start to show the same trend as the mature estates as the residents in these areas start to age since the current mature estates were consider new estates then in 2020.

From this take-home exercise, using R to build animation can be a little frustrating at the beginning. However, after working with it for awhile, it becomes clearer how some of the parameters in the ggplotly function works and how to link the ggplots to plotly.